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Abstract 

We present a mathematical analysis of the effects of Hebbian learning in random 
recurrent neural networks, with a generic Hebbian learning rule including passive 
forgetting and different time scales for neuronal activity and learning dynamics. 
Previous numerical works have reported that Hebbian learning drives the system 
from chaos to a steady state through a sequence of bifurcations. Here, we interpret 
these results mathematically and show that these effects, involving a complex cou- 
pling between neuronal dynamics and synaptic graph structure, can be analyzed 
using Jacobian matrices, which introduce both a structural and a dynamical point 
of view on the neural network evolution. Furthermore, we show that the sensitiv- 
ity to a learned pattern is maximal when the largest Lyapunov exponent is close 
to 0. We discuss how neural networks may take advantage of this regime of high 
functional interest. 



* Corresponding author, hugues.berry@inria.fr 



I. INTRODUCTION 



The mathematical study of the effects of synaptic plasticity (or more generally learning) 
in neural networks is a difficult task because the dynamics of the neurons depends on the 
synaptic weights network, that itself evolves non trivially under the influence of neuron dy- 
namics. Understanding this mutual coupling (and its effects on the computational efficiency 
of the neural network) is a key problem in computational neuroscience and necessitates new 
analytical approaches. 

In recent years, the related field of dynamical systems interacting on complex networks has 
attracted vast interest. Most studies have focused on the influence of network structure on 



the global dynamics (for a review, see (IBoccaletti et ali l2006l )). In particular, much effort 



has been devoted to the relationships between node synchronization and the classical sta- 
tistical quantifiers of complex network s (degree distribution, average clustering index, mean 



shortest path, motifs , mod ularity...) flGrinstein and Linskerl . 



2005 



200 



Nishikawa et al. 



Lago-Fernandez et al. 



20031 ). The core idea was that the impact of network topology on 



global dynamics might be prominent, so that these structural statistics may be good in- 
dicators of global dynamics. This assumption p roved however largely wrong and some o f 



the related studies yielded contradictory results (IHong et al. 



2002 



Nishikawa et al. 



20031 ). 



Actually, synchronization properties cannot be systematically deduced from topo logy statis- 
tics but may be inferred from the spectrum of the network (lAtav et all 120061 ) . Mo st of 
these studies have considered diffusive coupling between the nodes ( iHasegawal . l2005l ). In 
this case, the adjacency matrix has real nonnegative eigenvalues, and global properties, such 



as stability of the synchronized states 



from its spectral p roperties (see also (lAtay. et al 



(IBarahona and Pecora 



200 



20021) can easily be in 



Volchenkov and Blanchard 



in 


"erred 


3. 


2007) 



and (jChungl . 119971 ) for a review on mathematically rigorous results). Unfortunately, the 
coupling between neurons (synaptic weights) in neural networks is rarely diffusive, the cor- 
responding matrix is not symmetric and may contain positive and negative elements. In 
addition, the synaptic graph structure of a neural network is usually not fixed but evolves 
with time, which adds another level of complexity. Hence, these results are not directly 
applicable to neural networks. 

Discrete-time random recurrent neural networks (RRNNs) are known to display a rich vari- 
ety of dynamical behaviors, including fixed points, limit cycle oscillations, quasi periodicity 
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and deterministic chaos tooyon et al. 



19931 ). The effect of hebbian learning in RRNN, 



includi ng pattern r e trieva l properties, has been explored numerically by Dauce and some 



of us (IDauce et al. 



19981 ). It was observed that Hebbian learning leads to a systematic 



reduction of the dynamics complexity (transition from chaos to fixed point by an inverse 
quasi-periodicity route). This property has been exploited for pattern retrieval. After a suit- 
able learning phase the presentation of a learned pattern induces a bifurcation (e.g. from 
chaos to a simpler attractor such as a limit cycle). This effect is inherited via learning (it 
does not exist before learning), is robust to a small amount of noise, and selective (it does 
not occur for drastically different patterns) . These effects were however neither analyzed 



nor really understood in (Dauce et ali 



1998 



and expoited on a robotic platform in (IDauce et al. 



). This work w as extended to sequence learning 



2002h. 



More recently. Echo State Networks (ESN) (jjaeger and Haasl . 1200J) have been developed. 



where, as in our case, the network acts as a reservoir of resonant frequencies. However, 
learning only affects output links in ESN networks, while the weights within the reservoir 
are kept constant. Tsuda's ch aotic itineran cy is an alternative way for linking different at- 
tractors with different inputs (ITsudal . l200ll ). In this model, weights are initially fixed in a 
Hopfield-like manner (and are thus symmetric) and a chaotic dynamics successively explores 
the different fixed point attractors. In this scheme, each input constitutes an d ifferent initial 



condition that leads to one attractor of the same dynamical system, whereas in (IDauce et al. 



19981 ). each (time-constant) input leads to a different dynaimcal system. 



In the current state of the art, there is a relatively large number of models, observations and 
applications of Hebbian learning effects in neural networks, but considerably less mathe- 
matical results. Mathematical analysis is however necessary to classify the many variants of 
Hebbian learning rules according to the effects they produce. The present paper is one step 



19971 ) on the neural network model numerically studied in (iDauce et al. 



(HoDDenstea 


dt and Izhikevich. 


(Dauce et al.. 


1998 


) with spon- 



taneous (i.e. before learning) chaotic dynamics. 
We essentially classify the effects into three families: 

(i) Topological: the structure of the synaptic weight network evolves, implying prominent 
(e.g. cooperative) effects on the dynamics. 

(ii) Dynamical: the dynamical complexity (measured e.g. by the maximal Lyapunov ex- 
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ponent or the Kolmogorov-Sinai entropy) reduces during Hebbian learning. This effect is 
mathematically analyzed and interpreted. Especially, we provide a rigorous upper bound 
on the maximal Lyapunov exponent and identify two major causes for this reduction: the 
decay of the norm of the synaptic weight matrix and the saturation of neurons, 
(iii) Functional: Focusing on the network response to a learned pattern, we show that there 
is a learning stage at which the response is maximal, in the sense that it generates a drastic 
change of the neuronal dynamics (i.e. a bifurcation). This stage precisely corresponds to 
vanishing of the maximal Lyapunov exponent. 

Some of these results may appear neither "new" nor "surprising" for the neural network s 
community. For example, (ii) and (iii) have already been reported in (iDauce et all Il998l ). 
However, the results were mainly numerical while the present paper proposes a mathemat- 
ical framework and formal tools to analyze them. Moreover, a direct consequence of (iii) 
is that the response of the neural network to a learned pattern is maximal at the "edge of 
chaos" (where the maximal Lyapunov exponent vanishes). 

The claim that the neural netwo rk response is m aximal close to a bifurcation is common in 
the n eural network community (ILangtonl . Il990l ). Similarly, (IHoppensteadt and Izhikevichl . 
19971 ) already pointed out the necessity for some neurons to lie close to a bifurcation point 
in order to have relevant computational capacities. As a matter of fact, an analysis of the 
effects of a Hopfield-Hebb rule was performed in this book with neurons close to codimension 
one fixed-point bifurcations. 

We go a step further in the present paper and show that a sim ilar conclusion holds for a neu 



ral ne twork in a chaotic regime. Conceptually, the analvsis of flHoDPensteadt 



and Izhikevichl . 



19971 ) could be extended to chaotic systems ^ ( ICessac and SamuelideJ . 20071 ). However, the 
analytic treatment of the chaotic case is really challenging. Hence, bifurcation analysis of 
fixed points (or periodic orbits) uses a linear analysis via Jacobian matrices, which is usually 



^ A cornerstone of the analysis in ijHoppensteadt and Izhikevichl . 119971 ) is the use of Hartman-Grobman 
theorem, and its consequence, namely that neural networks have non trivial properties only if some 
neurons are close to a bifurcation point. In some sense, this analysis can be extended to uniformly 
hyperbolic dynamical systems, a small subset of chaotic systems (though it has never been done). In 
addition, it is absolutely not guaranteed that chaotic RRNNs are uniformly hyperbolic, since one does not 
control the spectrum of the Jacobian matrices. The main difficulty is to characterize this spectrum on the 
w-limit set (and not in the whole phase space). As a matter of fact, we do not know of any mathematical 
result with regard to this aspect. 
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considered non-applicable to chaotic systems where nonlinear effec ts and i nitial conditions 



sensitivity are prominent. Nevertheless, recent results by Ru elle (iRuelld. 



1999) on 



respo nse theory, formally extended to chaotic neural networks (jCessac and Sepulchre 



mear 



2006 . 



20071 ). show that a linear analysis is indeed possible if one uses an average of the Jacobian 



matrix along its chaotic trajectory. The associated linear response operator provides a deep 
insight into the links between topology and dynamics in chaotic neural networks. Inciden- 
tally, it shows that the relevant matrix is not the weight matrix (as would be expected), but 
the linear response matrix, which reduces, in the present context, to the ergodic average of 
the Jacobian matrix along its trajectory ^. 

Though the main results in this paper are mathematical, we also use some numerical simu- 
lations. They were necessary because mathematical results are obtained using a limit where 
time goes to infinity, which is not operational in numerical situations. Moreover, the central 
rigorous results we obtain provide upper bounds, whose quality had to be checked numeri- 
cally. 

The paper is organized as follows. We first present the model and the generic framework 
for neuronal dynamics and learning rules in section [Tll The following sections are devoted 
to the analysis of the model. In section llllt we present analytical results explaining the evo- 
lution of dynamics during learning using mathematical tools from dynamical systems and 
graph theory. These analytical results are confirmed by extensive numerical simulations. 
Section [IV] focuses on functional effects related to network sensitivity to the learned pattern. 
We finally discuss our results in the last section (jVl) . 



II. GENERAL FRAMEWORK 



A. Model description 

We consider firing-rate recurrent neural networks with point neurons and discrete- 
time dynamics, where learning may occur on a different (slower) time scale than neuron 
dynamics. Synaptic weights are thus constant for r > 1 consecutive dynamics steps, which 
defines a "learning epoch" . The weights are then updated and a new learning epoch begins. 

^ This result, which may a posteriori appear obvious to readers famihar with dynamical systems theory is 
in fact highly non trivial and requires Ruelle's linear response theory to be properly justified. 
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We denote by t the update index of neuron states (neuron dynamics) inside a learning 
epoch, while T indicates the update index of synaptic weights (learning dynamics). Call 
x'^^\t) G [0,1] the mean firing rate of neuron at time t within the learning epoch T. 



Set x(^)(t) 



it) 



N 



e [0, 1]^. Denote by F the function F : ^ R'" such that 



J i=l 



Fj(x) = f{xi) where / is a sigmoidal transfer function (e.g. f{x) = (1 + ta,nh{gx)/ 2)). Let 
W*-^-' be the matrix of synaptic weights at the T-th learning epoch. Then the discrete time 
neuron dynamics writes: 



x(^)(t + 1) = F [u(^)(t)] = F [W(^)x(^)(t) + ^] 



(1) 



'{t) is called "the local field (or the synaptic potential), at neuron time t and learning 
epoch T" . The output gain g tunes the nonlinearity of the function and mimics the reactiv- 
ity of the neuron. The vector ^ = (^i)^^ is the "pattern" to be learned. The initial weight 
matrix W^^^ is randomly and independently sampled from a Gaussian law with mean and 
variance Hence, the synaptic weights matrix W*-^-' = [W^p] typically contains 

V / i,j=l 

positive (excitation), negative (inhibition) or null (no synapse) elements and is asymmetric 

The network can display different dyn amical regirnes (ch aos, (quasi-) periodicity, fixed 
point), depending on these parameters (jPauce et al.i Il998l ). In the present study, the pa- 
rameters were set so that the spontaneous dynamics (i.e. the network dynamics at T = 1 ) 
was chaotic. At the end of every learning epoch, the neuron dynamics indices are reset, and 



X, 



(T+l) 



(0) 



x\ '{T),Wi. 



The learning rules we study conform to Hebb's postulate (IHebbl. Il948l). S pecificallv. we 



define the following generic formulation (IHoppensteadt and Izhikevich 



19971): 



N 



(2) 



where a is the learning rate and T^'^^ a Hebbian function (see below). The first term in the 
right-hand side (RHS) member accounts for passive forgetting, i.e. A G [0, 1] is the forgetting 
rate. If A < 1 and Tij = (i.e. both pre- and postsynaptic neurons are silent, see below), eq. 
([2]) leads to an exponential decay of the synaptic weights (hence passive forgetting), with a 



characteristic rate 



log(A)| 



[see discussion, section |V]). Note that there is no forgetting when 



A = 1. The second term in the RHS member generically accounts for activity-dependent 
plasticity, i.e. the effects of the pre- and postsynaptic neuron firing rates. We focus here on 



learning rules where this term depends on the history of activities^, i.e. 

rSf = Mxf\xf ) (3) 

where x-"^^ = is the trajectory of neuron i firing rate. In the present paper, as 

(T) 

a simple example, we shall associate to the history of neuron i rate an activity index m- . 

mP = -j2i^P{t)-d,) (4) 

r ^ — ' 

t=i 

where di G [0, 1] is a threshold and /i is a function of m] and rrij . 

(T) 

The neuron is considered active during learning epoch T whenever ml > 0, and silent 
otherwise, di does not need to be explicitly defined in the mathematical study. In numeri- 
cal simulations however, we set it to 0.50, Vi. Definition (jl]) actually encompasses several 
cases. If r = 1, weight changes depend only on the instantaneous firing rates, while if r ^ 1, 
weight changes depend on the mean value of the firing rate, averaged over a time window 
of duration r in the learning epoch. In many aspects the for mer case can be considere d as 



plasticity, while the latter may be related to meta-plasticity (lAbraham and Bearl . Il996l ). In 
this paper, we set r ^ oo for the mathematical analysis. We chose a value of r = 10^ in 
numerical simulations, which corresponds to the time scale ratio between neuronal dynam- 



ics (ms) and synaptic plasticity (10 s) (see (jPelord et all 120071 )). Importantly, note that 



other values of r (including r = 1) have been tested in simulations and did not lead to 
any qualitative change in the network behavior, although some integration lag effects were 
observed for very small values. Therefore, the exact value of r has no impact on the major 
conclusions of the present paper. 

The explicit definition of the function h in eq.(l3]) is constrained by Hebb's postulate 
for plasticity. This postulate is somewhat loosely defined, so that many implemen- 



tations are possible in our 



( iHoppensteadt and Izhikevichl . 119971 ): 



ramework. Our choice is guided by the following points 



As a matter of fact, note that F-J^ is a function of the trajectories i -"^^ , i^"^' , which depend on W^^^ 

(T — l) 

which in turn depends on F)^ ' ... Hence, the set of synaptic weights at time T +1 and the dynamics of 
the corresponding neurons arc functions of the whole history of the system. In this respect, we address 
a very untypical and complex type of dynamical systems where the flow at time t is a function of the 
past trajectory and not only a function of the previous state. (In the context of stochastic processes, such 
systems are called "chains with complete connections" by opposition to (generalized) Markov processes). 
This induces rich properties such as a wide learning-induced variability in the network response to a given 
stimulus, with the same set of initial synaptic weights, simply by changing the initial conditions. 
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1. h > whenever post-synaptic (i) and pre-synaptic (j) neurons are active, as in long- 
term potentiation (LTP). 

2. h < whenever i is inactive and j is active, corresponding to homosynaptic long-term 
depression (LTD). 

3. h = whenever j is inactive. This poin t is often considered as a corollary to Hebb's 



rule (IHoppensteadt and Izhikevich 



19971 ). Moreover, it renders the learning rule asym- 



metric and excludes the possibility that dynamics changes induced by learning could 
be due to weight symmetrization. This hy pothesis however formally excludes het- 
erosynaptic LTD (IBear and Abraham! . Il996l ). which would correspond to h < for i 
active and j inactive. However, most of the results presented herein remain valid in 
the presence of heterosynaptic LTD (see section |V] for a discussion) . 

Although these settings are sufficient for mathematical analysis, h has to be more precisely 
defined for numerical simulations. Hence, for the simulations, we set an explicit implemen- 
tation of r*-"^-* such that : 



(5) 



where m*^-^) 



(T) 



N 



J i=l 



Him, 



N 



i=l 



H{x) is the Heaviside function, H(ui^^^) = 
m'^"^^i7(m*^-^^) is the vector of components mf^^ H{mf"^) and + denotes the transpose. Fi- 

(T) 

nally, in the simulations, we forbid weights to change their sign, and self-connections W^^ 
stay to (note however that these settings do not infiuence qualitatively the results pre- 
sented here). 

For the purpose of the present paper, the exact value of this input pattern ^ is not very im- 
portant, as soon as its maximal amplitude remains small with respect to the neuron maximal 
firing rate. Here, we used C,i = 0.010 sin {2TTi/N) cos (Siri/N) , = 1 . . . in all numerical 
simulations. The main rationale for this choice is that this pattern is easily identified by eyes 
when the ^jS are plotted against i, which is particularly helpful when interpreting alignment 
results, such as in fig. [3l 

Equations ([I]) & (jSl) define a dynamical system where two distinct processes (neuron dy- 
namics and synaptic network evolution) interact with distinct time scales. This results in a 
complex interwoven evolution where neuronal dynamics depends on the synaptic structure 
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and synapses evolve according to neuron activity. On general grounds, this process has a 
memory that is a priori infinite and the state of the neural network depends on the past 
history. 



B. Analysis tools 

One possible approach to topology and dynamics interactions in neural networks consists 
in searching structural cues in the synaptic weight matrix that may be informative of specific 
dynamical regimes. The weight matrix is expected to carry information about the functional 
network. However, it can be easily shown that the synaptic weight matrix is not sufficient 
to analyze the relationship between topology and dynamics in neural networks such as ([1]). 
A standard procedure for the analysis of nonlinear dynamical systems starts with a linear 
analysis. This holds e.g. for stability and bifurcation analysis but also for the computation 
of indicators such as Lyapunov exponents. The key object for this analysis is the Jacobian 
matrix. In our case, it writes: 

DFx = A(u)W, (6) 



with: 



(7) 



Interestingly enough, the Jacobian matrix generates a graph structure that can be inter- 
preted in causal terms (see Appendix [F] for more details). Applying a small perturbation 6j 
to Xj, the induced variation on Xi is given, to the hnear order, by f'{ui)Wij6j. Therefore, 
the induced effect, on neuron z, of a small variation in the state of neuron j is not only 
proportional to the synaptic weight Wij, it also depends on the state of neuron i via /'. For 
example, if \ui\ is very large (neuron "saturation"), /' is very close to and the perturbation 
on any xj has no effect on Xi. 

From this very simple argument we come to the conclusion that the Jacobian matrix displays 
more information than the synaptic weight matrix: 



1. The "causal" graph induced by the Jacobian mat rix leads to the notion of cooperative 



systems, introduc ed by Hirsch in 
genetic networks (iGouze 



1998 



Hirschl . 



Thomas 



19891) and widely studied in the field of 



19811 ). This notion is also useful in the 



present context (see appendix [F|) . 
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2. The Jacobian matrix allows to perform local bifurcation analysis. In our case, this 
provides information about the effect of pattern presentation before and after learning 
(section [IV|) . 

3. The Jacobian matrix allows to define Lyapunov exponents, which are used to measure 
the degree of chaos in a dynamical system. 



4. 



'he Jacobian matrix al 



(ICessac and Sepulchre 



ows to define the notion of linear response in chaotic systems 



2006 



2007 



Ruelle 



19991 ). which extends the notion of causal 



graph to nonlinear systems with chaotic dynamics (see in section ^Vj 



III. DYNAMICAL VIEWPOINT 



As explained in the introduction and reported in (jPauce et q/.I . Il998l ). Hebbian learning 
rules can lead to reduction of the dynamics complexity from chaos to quasiperiodic attractor, 
limit cycle and fixed point, due to the mutual coupling between weights evolution and neuron 
dynamics. The aim of this section is to provide a theoretical interpretation of this reduction 
of complexity for a more general class of Hebbian learning rules than those considered in 



toauce et al. 



19m . 



A. Entropy reduction. 



1. Evolution of the weight matrix. 



From eq. ([2]) it is easy to show by recurrence that: 

T 



N ^ 



(8) 



n=l 



The evolution of the weight matrix under the influence of the generic learning rule eq.(l2]) 
originates from two additive contributions. If A < 1, the "direct" contribution of W^^-* to 
)/y(T+i) grgt term in the RHS member) decays exponentially fast. Hence the effect of 
A is that the initial synaptic structure is progressively forgotten, offering the possibility to 



entirely "rewire" the network in a time scale proportional to 



|log(A)| 



The second RHS term 



of eq. ([8]) corresponds to the new synaptic structure emerging with learning and replacing 
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the initial one (which fades away exponentially fast). Importantly, this second term includes 
contributions from each previous matrices T^^\ \/n <T (with an exponentially decreasing 
contribution A"^~"). Hence, the emerging weights structure depends on the whole history of 
the neuronal dynamics. 

If A < 1, one expects to reach a stationary regime where synaptic weights do not evolve 
anymore: both matrices W^^-* and T^'^^ are expected to stabilize at long learning epochs 
to constant values (limr-*oo W^^^ = and limr^ooTC^) = r(°°)). This means that, if 

A < 1, the dynamics settle at long learning epochs onto a stable attractor that is not modified 
by further learning of a given stimulus. The existence of such a stationary distribution is 
provided by the sufficient condition: 

N{i-\) ■ ^' 

We show in appendix [B] that, assuming moderate hypotheses on h (eq. [3]), ||r('^)|| can be 
upper-bounded, VT, by a constant NC, so that ||W^°°-'|| < aC / (1 — A). From eq.([8]), an 
upper bound for the norm of W*-^^ is trivially found: 

<A^||W«|| + -^f^A^-"||r(")||, (10) 

ra=l 

where |{|{ is the operator norm (induced e.g. by Euclidean norm). Hence, 

llWC^+i)]! < A^||W«l| < A^||W«1| (11) 

1 — A 1 — A 

This result shows that the major effect of the Hebbian learning rule we study may consist 
in an exponentially fast contraction of the norm of the weight matrix, which is due to the 
term A, i.e. to passive forgetting (A < 1). Note also that if A = 1, this term may diverge, 
leading to a divergence of W^^-*. Therefore, in this case, one has to add an artificial cut-off 
to avoid this unphysical divergence. 

These analytical results need not to be "confirmed" by numerical simulations, as they are 
rigorous. However, they only provide an upper bound that can be rough, while simulations 
allows to evaluate how far from the exact values these bounds are. 

Let sf^ be the eigenvalues of W*-"^-*, ordered such that > jsg"^^! > . . . > sf'^ > . . .. 

Since the spectral radius of W^^-*, is smaller than IIW-"^-*!! one has from eq. fllll) : 

|.r+')|<A^||W(^)||+«C-l-. (12) 

1 — A 
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1 10 100 

Learning epoch T 

FIG. 1 The Hebbian learning rule eq. JSjl contracts the spectral radius of W. The evolution during learning of the norm of W 
largest eigenvalue, is plotted on a log-log scale for, from bottom to top, A = 0.80 (squares), 0.90 (circles), 0.95 (triangles) 

or 1.00 (diamonds). Each value is an average over 50 realizations with different initial conditions (initial weights and neuron 
states). Standard deviations are smaller than the symbols. Black full lines are plots of exponential decreases with equation 
3(r) = |sW|A^. 

This equation predicts a bound on the spectral radius that contracts exponentially fast with 
time, under the control of the forgetting rate A. Figure [T] shows the evolution of the spectral 
radius of W^^^ for different values of A during numerical simulations (open symbols). The 
results show that the spectral radius indeed decays exponentially fast. Moreover, we also 
plot on this figure (full lines) exponential decays according to the first RHS member of 
eq. (fT2|) . i.e. g{T) = |s^^'*|A"'". The almost perfect agreement with the measurements tells us 
that the bound obtained in eq. f|T2l) actually represents a very good estimate of the value of 



2. Jacobian matrices. 

Let X G [0, 1]^. A bound for the spectral radius of ^'F^'' can easily be derived from [TT] 
and [61 Call jj,f^\x.) the eigenvalues of DF^^ ordered such that \jj,^^\x.)\ > |yU2"^^(x)| > . . . > 
IfJ'i (x)| > .... One has, Vx: 

|/ir)(x)|<pF?||<||A(u(^))||||W(^)||. (13) 

Since ||A(u('^^)|| = maxj f'{v!'P) (A is diagonal and /' > 0), one finally gets 

|/ir)(x)|<max/'(«f))||W(^)||. (14) 

Therefore, we obtain a bound on the spectrum of DF^ that can be contracted by two 
effects: the contraction of the spectrum of W*^^-' and/or the decay of maxj/'(uj) related to 
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the saturation of neuronal activity. Indeed, f'{ui) is small if Xi is saturated to or 1 (i.e. \ui\ 
is large), but large whenever \ui\ is intermediate, i.e. falls into the central, pseudo-linear part 
of the sigmoid f{ui). We have already evidenced above that A < 1 yields to a decrease of 
II W*-"^-* II . Note that even if A = 1 (no passive forgetting) and W*-"^^ diverges, then u^^-^^ diverges 
as well, leading maxj fiuf' '^) to vanish, thus decreasing the spectral radius of the Jacobian 
matrix. Hence, if the initial value of \^\ (x)| is larger than 1 and the bound in eq. (fT^ 
represents an accurate estimate of |/i[^''(x)|, eq. ffT^ predicts that the latter may decrease 
down to a value < 1. We are dealing here with discrete time dynamical systems, so that 
the value \^\ (x)| = 1 locates a bifurcation of the dynamical system. Hence, eq. lfT^ opens 
up the possibility that learning drives the system through bifurcations. Again, simulations 
(fig. H]) show that the bound obtained in eq. [T3] is indeed very close to the actual value of the 
Jacobian matrix spectral radius. As will be shown later (section HV]) . this point is of great 
importance from a functional viewpoint. 



3. A hound on the maximal Lyapunov exponent. 

Eq. f|T4|) depends on x and cannot provide information on the typical behavior of the 
dynamical system. This information is provided by the computation of the largest Lyapunov 
exponent (see appendix |X] for definitions). In the present setting, the largest Lyapunov 

(T) 

exponent, L\ depends on the learning epoch T. It can be computed exactly b efore learning 



in the thermodynamic limit oo, because Wij^s are i.i.d. random variables (jCessad . ll995l ) 
and it can be showed that it is positive provided g is sufficiently large^. However, because 
the weights deviate from i.i.d. random distribution under the influence of Hebbian learning, 

(T) 

the evolution of L\ cannot be computed analytically as soon as T > 1. Nevertheless, the 
following theorem (proven in appendix [C]) yields a useful upper-bound of L^^^ : 

Theorem 1 

< log(||W(^)||) + (log(max/'(n.))\^^^ . (15) 



In the limit N ^ oo and for random i.i.d. weights with mean and variance j^, \ii'{"\x)\ converges 



Cessac , 


1994; 


Girko , 


1984) 
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where (log(maXj f'{ui))) denotes the time average o/log(maXj f'{ui)), in the learning epoch 
T (see appendix for details). 

(T) 

This theorem emphasizes the two main effects that may contribute to a decrease of L\ . 

(T) 

The first term in the RHS member states that the upper bound on L\ decreases if the norm 
of the weights matrix IIW-"^^!! decreases during learning. The second term is related to the 
saturation of neurons. However, the main difference with eq. (HM is that we now have an 
information on how saturation effects act on average on dynamics, via log(/'). The second 
term in the RHS member is positive if some neurons have an average log(/') larger than 
1 (that is, they are mainly dominated by amplification effects corresponding to the central 
part of the sigmoid) and becomes negative when all neurons are saturated on average. 
In any case, it follows that if learning increases the saturation level of neurons or decreases 
the norm of the weights matrix ||VV^^^ ||, then the result can be a decay of L^i'^ (if the bound 
is a good estimate), thus a possible transition from chaotic to simpler attractors. A canonical 
measure of dynamical complexity is the Kolmogorov-Sinai (KS) entropy which is bounded 
from above by the sum of positive Lyapunov exponents. Therefore, if the largest Lyapunov 
exponent decreases, KS entropy and the dynamical complexity decrease. On numerical 
grounds we observe the following. Fig. [2]A. shows measurements of L^^^ during numerical 
simulations with different values of the passive forgetting rate A. Its initial value is positive 
because we start our simulations with chaotic networks {L^^^ ^ 0.21 ± 0.10). The Hebbian 
learning rule eq.([S]) indeed leads to a rapid decay of L^i'\ whose rate depends on A. Hence 

(T) 

L\ shifts quickly to negative values, confirming the decrease of the dynamical complexity 
that could be inferred from visual inspection of temporal traces of the network averaged 
activity (fig.[2]B). 

(T) 

To conclude, our mathematical framework indicates a systematic decay of L\ induced by 
passive forgetting and/or increased neuronal saturation. This decay explains the decreasing 
dynamical complexity from chaos to steady state that is observed numerically. 

B. Neuron activity. 

We now present analytical results concerning the evolution of individual neuron activ- 
ity. Application of the learning rule eq.([2]) changes the structure of the attractor from one 
learning epoch to the other. The magnitude of this change can be measured by changes 
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FIG. 2 The Hebbian learning rule eq. JSjl induces reduction of the dynamics complexity from chaotic to periodic and fixed 
point. (j4) Evolution of the largest Lyapunov exponent L\ during 100 learning epochs for, from bottom to top, A = 0.80 
(squares), 0.90 (circles), 0.95 (triangles) or 1.00 (diamonds). Each value is an average over 50 realizations with different initial 
conditions (initial weights and neuron states). Bars are standard deviations (and are mostly smaller than symbol size). The 
dashed lines illustrate decays of the form g{T) oc Tlog(A) (see text). {B) Examples of network dynamics when learning is 
stopped at epoch (from bottom to top) T = 1 (initial conditions, chaos), 5 (limit cycle), 6 (simpler limit cycle) or 100 (fixed 
point). These curves show the network-averag ed state (x(^)(t)) = l/A^ E£i a;^' (*) ^nd are shifted on the y-axis for clarity. 
The height of the vertical bar represents an amplitude of 0.1. N = 100 and all other parameters are as in fig. [T] 

in the average value of some relevant observable such as neuron activity (more generally, 
learning induces a variation in the SRB measure p^^-*, see appendix E]) • Let 5p'^"^"'"^)(x) be 
the variation of the average activity x between learning epoch T and T + 1. By definition 
(see appendix 

5p(^+i)(x) = (x)(^+^)-(x)(^). (16) 

We show in appendix [D] that the average value of the neuron local field, u, at learning epoch 
T depends on four additive terms: 

(u)(^+^) = A^(u)(^) + (l-A^)^+Af]A^~"W(")5p("+^)(x) + ^f]A^-"r(")(x)("+^). (17) 

n=l n=l 

Provided that A < 1, as T ^ +oo, time averages of observables converge to a constant. So 
that 5p^'^\'s) — > and lim-r^+oo (x)''"^'* = (x)''°°'*. Therefore, asymptotically: 

(u)(~) = ^ + H(°°), (18) 



16 



where: 

Therefore, the asymptotic local field ((u)*-°°'') is the sum of the stimulus (input pattern) 
plus an additional vector Ji^°°^ which accounts for the history of the system. Note that 
equations f|T8|) . f|T9|) characterize the asymptotic regime T (yo which usually corresponds 
to a fixed-point (see figE]) with hmited dynamical and functional interest (see e.g. fig. Hj). 
On intermediate time scales, eq. f[T7|) must be considered. It shows that the local field u 
contains a constant component (the input pattern) as well as additional (history-dependent) 
terms whose relative contribution cannot systematically be predicted. 
Figure [3] shows numerical simulations of the evolution of the local field u during learning. 
Clearly, while the initial values are random, the local field (thin full line) shows a marked 
tendency to converge to the input pattern (thick dashed line) after as soon as 10 learning 
epochs. The convergence is complete after ^ 60 learning epochs. An additional term 
corresponding to H^'^^ is observed numerically (but is hardly visible in the normalized 
representations of fig. [3]). This last term has an interesting structure in the case of the 
learning rule ([3]). Indeed, in this case: 

A" (1 — A) 



so that: 



where : 



^ mf'jf = Yl (4 . (21) 

j,m\ '>0 j,x\ '>dj 

can be interpreted as an order parameter. A large positive t] means that neurons are mainly 
saturated to 1, while a small rj corresponds to neuron whose average activity is close to cij. 
Note that rj is related to a set of self-consistent equations. Indeed, since Xj = /(wj) one has: 

a 



{/(!<i))<~' - d, (22) 



In the case where this constant asymptotic attractor is a fixed point (i.e. the attractor with 
smallest complexity), one has: 

< = 6 + J^-^^vifiO - rf.), (23) 
17 
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FIG. 3 The local field u = ^ + VVx (thin full line) and the real part of the first eigenvector of the Jacobian matrix (thin 
dotted line) converge to the input pattern ^ (thick dashed line) at intermediate-to-long learning epochs. Snapshot are presented 
at T = 1 (A, initial conditions), T = 10 (B), T = 60 (C) and T = 200 (D) learning epochs. Each curve plots averages over 50 
realizations (standard deviations are omitted for clarity), vectors have been normalized to [0, 1] for clarity. All other parameters 
as in fig. [T] 



where u* and x* denote the values of u and x, respectively, on the fixed point attractor. 
Here, the set of nonlinear self-consistent equations (l22l) includes both a local (uf) and 
a global term (the order parameter t]). Assume that we slightly perturb the system, for 
example by removing the stimulus C,i for some neurone i. If the system ( 122|) is away from 
a bifurcation point, this perturbation is expected to result in only a slight change in u*. 
Alternatively, if a bifurcation occurs, a dramatic change in u* can take place. This local 
modification of activity may in turn yield a big change in 77, which corresponds to a global 
(i.e. network-wide) modification of activity, through a some avalanche-like mechanism. On 
practical grounds this means that presentation or removal of some parts of the input pattern 
may induce a drastic change of the dynamics of the network. 
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IV. FUNCTIONAL VIEWPOINT 



Pattern recognition is one of the functional properties of RRNNs. In our terms, a pattern 
is "learned" when its presentation (or removal) induces a bifurcation ^. Moreover, this effect 
must be acquired via learning, selective (i.e. only the presented pattern is learned) and 
robust (i.e. a noisy version of the learned pattern should lead to an attractor similar to the 
one reached after presentation of the learned pattern). We now proceed to an analysis of the 
effect of pattern removal, as a simple indicator of the functional properties of the network. 
A deeper investigation of the functional properties of the network is out of the scope of the 
present study and will be the subject of future works. 

Label by x (resp. u) the neuron firing rate (resp. local field) obtained when the (time 
constant) input pattern ^ is applied to the network (see eq. [1]) and by x' (resp. u') the 
corresponding quantities when ^ is removed (^ = 0). The removal of $, modifies the attractor 
structure and the average value of any observable (though the amplitude of this change 
depends on 0). More precisely call: 



(0(x'))(^)-(0(x)) 



(T) 



(24) 



where (0(x'))*'^'' is the (time) average value of without ^ and (0(x))^^'' the average value 
in the presence of ^. Two cases can arise. 

In the first case, the system is away from a bifurcation point and removal results in a 
variation of A*^^-* [(f)] that remains proportional to $, provided ^ is sufficiently small (remember 
here that the present network admits a single attractor at a given learning epoch). Albeit 
common for non-chaotic dynamics, we emphasize that this statement still holds for chaotic 
dynamics. This has been rigorously proven for uniformly h yperbolic systems, thanks to 



the linear response theory developed by Ruelle (jRuelld . Il999l ). In the present context, the 



l inear response theory predicts th at the variation of the average value of u is given by 



( ICessac and Sepulchre 



2006 



20071 ): 



A(^) [u] 



(25) 



This i dea, as well as the preceding works of the authors on this topic was deeply influenced by Freeman's 



work ( Freeman 



1987 



Freeman et al 



1988!) 
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where 

oo 

X^^^ = J2{DF-r^ (26) 

n=0 

is a matrix^, whose entries can be written: 

+00 n n \ (T) 

= ^ + E E n ^'^^'^-^ ( n /'(^^^-i - 1)) ) (27) 

n=l7,j{n)/=l \l=l I 

where the sum holds on every possible path 'jijin) of length n, connecting neuron 

ko = j to neuron /c„ = i, in n steps. 

Note therefore that A^^^ [u] = -| - M^^)^ where the matrix M^^) = Y.n=i (^F")^"^^ inte- 
grates dynamical effects. A slight variation of at t = implies a reorganization of the 
dynamics which results in a complex formula for the variation of (u)^"^-*, even if the domi- 
nant term is as expected. More precisely, as emphasized several times above, one remarks 
that each path in the sum J2'yij{n) weighted by the product of a topological contribution 
depending only on the weights Wij and on a dynamical contribution. The weight of a path 
'jij depends on the average value of (nr=i fi^h^-ii^ ~ 1))) ^^^^ correlations between 
the state of saturation of the units ko, . . . , kn-i at times 0, . . . , n — 1. 

Eq. [25] shows how the effects of pattern removal are complex when dealing with a chaotic 
dynamics. However, the situation is much easier mathematically in the simplest case where 
dynamics have converged to a stable fixed point u**^-^) (resp. x**^^^). In this case, eq. (l25ll 
reduces to: 

oo 

A(^)[u] = -5^(W(^)A(u*))"^ (28) 

n=0 

Calling Afc, Vfc the eigenvalues and eigenvectors of >V*-'^''A(u*^"^^), ordered such that |AAr| < 
IAat-iI < I All < 1 one obtains: 

A(-Mu] = -f:^^v. (29) 



6 



The convergence of this series is discussed in (jCessac and Sepulchre! . 12004120061 : iRuelld . 119991 ). Note that 



a similar formula can be written for an arbitrary observable ^, but is more cumbersome. 
^ Incidentally, this equation shows once again why the synaptic weight matrix is not sufhcient to capture 
the dynamical effects of a perturbation. Indeed, it contains a purely topological term (OILi W^fcjfei-i) and 
also depends on a "purely dynamical" term (nr=i /'(""^i-i ~ 1)))'^"'"^ that involves an average of the 
derivative of the transfer functions along the orbit of the neural network. 
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where ( , ) denotes the usual scalar product. Actually, this result can easily be found 
without using linear response, by a simple Taylor expansion (see appendix [E]). The response 
is then proportional to $, but becomes arbitrary large when Ai tends to 1 and provided 
that (vi,^) > 0. This analysis can be formally extended to the general case (i.e. including 
chaos, eq. f2E\i but is delicate enough to deserve a treatment by its own and will be the scope 
of a forthcoming paper^. Here, we simply want to make the following argument. From 
the analysis above, we expect pattern removal to have a maximal effect at "the edge of 
chaos", namely when the (average) value of the spectral radius^ of DF^ is close to 1. As 
mentioned above, the effects are however more or less prominent according to the choice of 
the observable 0. We empirically found that the effects were particularly prominent with 
the following quantity: 



1 ^ 2 

^^"^[^] = ^\ E - (A.(uO)^"^) (30) 

\ i=i 

Indeed, An = f'{ui) is maximal when the local field of i falls in the central pseudo-linear 
part of the transfer function, hence where neuron i is the most sensitive to its input. Hence 
A'^"^)[A] measures how neuron excitability is modified when the pattern is removed. The 
evolution of A(^^[A] during learning following rule eq.Q is shown on fig. H] (full lines) for 
two values of the passive forgetting rate A. A^^)[A] is found to increase to a maximum at 
early learning epochs, while it vanishes afterwards. Interestingly, comparison with the decay 
of the leading eigenvalue of the Jacobian matrix, yUi (dotted lines) shows that the maximal 
values of A'^"^)[A] are obtained when = |Ai| is close to 1. Hence, these numerical 
simulations confirm that sensitivity to pattern removal is maximal when the leading 
eigenvalue is close to 1. Therefore, "Hebb-like" learning drives the global dynamics through 



(T) 

This can be achieved by formally "diagonalizing" the matrices (DF")^ ' but the problem is that eigen- 
values Afe(n) and eigenvectors Vfc(n) now depend on the time n. Information about the time dependence 
of the spectrum can be fou nd using the Fourier transform of the matrix x looking for its poles 



(jCessac and Sepulchrel . 120061 ). These poles are closely related to t he graph struct ure induced by the Jaco 



bian matrices, by standard traces formula and cycle expansions (jGaspardl . Il998( ). Essentially, we expect 
that, under the effect of learning, the leading resonances move toward the real axis leading to a singularity 
at the edge of chaos. The motion should be closely related to the reinforcement of feedback loops discussed 
in appendix IF! 

There is a subtlety here. We have DFx = A(u)W, while in formula ([29|) we consider the eigenval- 
ues of yyA(u). However, if Afc,Vfc are eigenvalues and eigenvectors of WA{u) then A{u)WA{u)vk = 
I?FxA(u)vfc — AfcA(u)vfc. Therefore, Afc, A(w)vfc are eigenvalues and eigenvectors of -DFx- 
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FIG. 4 The network sensitivity to the input pattern is maximal close to a bifurcation. The evolution of the average value 
for the spectral radius of DF^' during learning (dotted line) is plotted together with the sensitivity measure A^-^' [A] (full 
line) for A = 0.80 {A) or 0.90 (B). The panels also display the corresponding evolution of the largest Lyapunov exponent Li, 
plotted as 1.0 + Li for obvious comparison purpose (dashed line). The values of A(-^'[A] arc normalized to the [0 — 1] range 
for comparison purposes. Each value is an average over 50 realizations (standard deviations are omitted for clarity). All other 
parameters were as in fig. [T] 

a bifurcation, in the neighborhood of which sensitivity to the input pattern is maximal. This 
property may be crucial regarding memory properties of RRNNs, which must be able to 
detect, through their collective response, whether a learned pattern is present or absent. 
This property is obtained at the frontier where the strange attractor begins to destabilize 
= 1), hence at the so-called "edge of chaos". 
We showed in section IIII.AI that the Hebbian learning rules studied here contract the 
spectral radius of DFx,Vx, so that the latter crosses the value 1 at some learning epoch. 
Thus, 1 is ensured to be an eigenvalue of -DFx at some point . The evolution of fi, the 
eigenvector associated to the leading eigenvalue of the Jacobian matrix /ii, is less obvious. 
We plot on fig. [3] (dotted lines) the evolution of its real part during numerical simulations 
(actually, its imaginary part vanishes after just a couple of learning epochs). It is clear from 
numerical simulations that the possibility of a vanishing projection of the input pattern 
^ (thick dashed line) on fi can be ruled out (the two vectors are not orthogonal). The 
tendency is even opposite, i.e. Vi is found to align on the input pattern at long learning 
epochs (T > 100; note that we were not able to find a satisfactory explanation for this 
alignment). 
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V. DISCUSSION 



The coupled dynamical system studied in the present paper (eqs.([T]) and ([2])) is based 
on several simplifying assumptions that allowed the rigorous mathematical study we have 
presented. However, many of the results we obtain remain valid when some of these assump- 
tions are relaxed to improve biological realism. Here, we give a brief overview of the related 
arguments. As already stated in the introduction, we do not pretend to en compass the spec- 
trum of complexity and richness of biological learning and plasticity rules (IKim and Lindenl . 



20071 ). However, the present study focuses on the major type of synaptic plasticity (i.e Heb- 



bian plasticity), which is generally considered as the principal cellular basis of behavioral 
learning and memory. 

The learning rule we study here eq.([2]) includes a term that allows passive forgetting (A < 1). 
This possibility is supported by a body of experimental data that shows that synaptic weights 
decay exponentially toward their baseline after LTP, in the absenc e of subsequent homo- 



or hetero-synaptic LTD, with time constants from seconds to days ( lAbraham et al. 



2002 



Brager et al. 



1994, 



20031 ). A plausible molecular mechanism for this passive behavior has 



been recently proposed, which relies on the operation of kinase and phospha tase cycles that 



are systematically implicated in learning and memory (jPelord et al. 



20071 ). Our theoreti- 



cal results predict that learning-induced reduction of dynamics complexity can still arise in 
the limit case of A = 1. Indeed, numerical simulations of Hebbian learning rules devoid of 
passive forgetting (i.e. with A = 1) have clearly evidenced a red uction of the attractor com- 



plexity during learning (IBerry and Quoyl . 



2006 



Siri et al. 



20061 ). In this case, the reduction 



of the attractor complexity is provoked by an increase of the average saturation level of the 
neurons, in agreement with our present analytical results. As a matter of fact, the question 
is not so much to know what exactly is the value of A in real neural networks, but how the 



characteristic time scale 



1 



log{A)| 



compares to other time scales in the system. 



Another assumption of the generic Hebbian rule we study is that Tij = whenever the 
presynaptic neuron is silent. As already mentioned section III. At an interpretation of this 
assumption is that this learning rule excludes heterosynaptic LTD. To assess the impact of 
this form of synaptic depression in the model, we ran numerical simulations using a variant 
of eq.([5]) in which the Heavyside term (that forbids heterosynaptic LTD) was omitted. The 
results of these simulations (not shown) were in agreement with all the analytical results 
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supported here, including those on spectral radius contraction. In agreement with these 
numerical simulations, our analytical results on the contraction of the spectral radius are 
expected to remain valid when heterosynaptic LTD is accounted for, but this would require 
extending the model definition and further mathematical developments that are out of the 
scope of the present study. 

The effects of Hebbian learning were studied here in a completely connected, one population 
(i.e. each neuron can project both excitatory and inhibitory synapses) chaotic network. 
While this hypothesis allows a rigorous mathematical treatment, it is clearly a strong ide- 
alization of biological neural networks. However, we have tested the analytical predictions 
obtained here with numerical simulations of a chaotic recurrent neural network with con- 
nectivity mimicking cortical micro-circuitry, i.e. sparse connectivity and distinct excitatory 
and inhibitory neuron populations. These simulations unambiguous ly demons t rated that 



our analytical results are still valid in these more realistic conditions (jSiri et all 120071 ). 
From a functional point of view, we have shown that the sensitivity to the learned pattern 
is maximal at the edge of chaos. Starting from chaotic dynamics, this regime is reached 
at intermediate learning epochs. However, longer learning times result in poorer dynamical 
regimes (e.g. fixed points) and the loss of s ensitivity to the l earne d pattern. Additional 



plast icity mechanisms like synapt ic scaling (ITurrigiano et all Il998l ) or intrinsic plasticity 



ref ( iDaoudal and Debannd . l2003l ) may constitute interesting biological processes to main- 
tain the network in the vicinity of the edge of chaos and preserve a state of high sensitivity 
to the learned pattern. Such possibilities are currently under investigation in our group. 
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APPENDIX A: Definitions. 

Dealing with chaotic systems, one is faced with the necessity to defining indicators mea- 
suring dynamical complexity. There are basically two families of indicators: one is based on 
topological properties (e.g. topological entropy), the other is based on statistical properties 
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(e.g. Lyapunov exponents or Kolmogorov- Sinai entropy). The latter family can easily be 
accessed numerically or experimentally by time averages of relevant observables along typi- 
cal trajectories of the dynamical system. However, to this aim, one has to assume a strong 
ergodic property: the time average of observables, along trajectories corresponding to initial 
conditions drawn at random with respect to a probability distribution having a density (with 
respect to the Lebesgue measure), is constant (it does not depend on the initial condition). 
This property is far from being evident. Actually, we are not able to prove it in the present 
context. On mathematical grounds, it corresponds to the following assumption. 

Assumption 1 Call fi^ is the Lebesgue measure on [0, 1]^ and let F'^fiL the image of fi^ 
under . We assume that the following limit exists: 



hm -Vi^W 

t=l 



(Al) 



where the pro bability measure p^^^ i s called "the Sinai-R uelle-Bowen (SRB) measure at learn 



1973; 



Ruelk . 



1978^ : 



Sinai . 



197m)- Under this assumption the following 



ing epoch T " /(Boweri . 

holds. Let : [0, 1]^ ^ be some suitable (measurable) function. Then the time average: 



a;(^)(0)] = hm -y0(a;(^)(t)). 



(A2) 



where x(t) = I^{x), is equal to the ensemble average: 



for Lebesgue-almost every initial condition x^'^\0). 



(A3) 



In other words, time average and ensemble average are identical on practical grounds. 
The use of p^'^^ is required to prove the mathematical results below while time average is 
what we use for numerical simulations. 

Note that in doing so, we have constructed a family of probability distributions p'-"^-' that 
depends on the learning epoch T. p^'^^ provides statistical information about the attractor 
structure. A prominent example is the maximal Lyapunov exponent. Let x G [0, 1]^, 
V G and p be an SRB measure. Then, the largest Lyapunov exponent is given by: 
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L^P = lim lim - log ( ^^^^4^1 (A4) 

^ t^oo||v|H0t V l|v|| / 

Its value is constant for p*^"^) almost every x. (Note indeed that the LHS does not depend 
on X, while the RHS does. This is a direct consequence of the assumption that p^^-* is an 
SRB measure). 



APPENDIX B: Asymptotic behaviors 

In the specific learning rule eq.Q used in our numerical simulations, Tij = mimjH{mj). 
Thus 

liril = s^P.^ (Bi) 

= sup j'^["^y^"" (B2) 

< ||m|||| [mi7(m)]+|| (B3) 

N \ 1/2 / ^ \ V2 

.4=1 / y=i,m^>o J 

< A^v^ (B6) 

where [v]"*" denotes the transpose of vector v, 'Yl,j=imj>o denotes a sum restricted to the 
active neurons and (j) is the fraction of active neurons. Hence 

||r(^)|| < A^v^ (B7) 

If (as observed in our numerical simulations) c/)^^-* tends to a stationary value 0*^°°-' then 



< 



||r(^)|| < A^V0M (B8) 

Hence F is bounded in the specific case of eq.([S]) by a constant N\/ (f)^°°\ 

More generally, ||r|| is bounded provided that the function /i in ([3]) is bounded as well. 



APPENDIX C: Proof of theorem [D 

Let v,x G R^. Denote by x(t) = F*(x), and v(t) = DY^it)-DY^^^ v(0) = v. From 
the chain rule: 
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DF^vll ||DFx(*)v(t- 1)11 ||v(t- 1)11 



l|v|| l|v(t-l)|| HI 

|DFx(t)v(t - 1)11 ||DFx(i-i)v(t - 2)11 pFx(i)v| 



l|v(t-l)|| ||v(t- 

Therefore: 



Li^'* = lim lim - loe ( - 



^Fx(„)v(n 



n=l 

Since pv|| < p||||v|| : 

t 



|v(n-l)|| 



Lf) < lim ^ J]log(||DFx(„)||) = (log(llDFxll))^'^^ p(^) - almost surely. 

t- >oo 7- 

n=l 

But since DFx = A(u)W, we have Hl^FxH < ||>V||||A(u)|| < || W|| maxi(f (m^). 



APPENDIX D: Local fields 

Fix X and the time epoch T. Set u = W'-^-'x + ^. The average of u, (u)*-"^'* is defined 
either by the time average flA2p or by the ensemble average flASp . However, since W*-"^^ is 
constant during a given learning epoch one has: 

(u)(^) = W(^)(x)(^) + ^, VT. (Dl) 

Therefore: 



^ ^(T+l) ^ ^ ^ (^^(T) ^ ^r(^))((x)(^) + 5p(^+^)(x)) + I, 

where 5p*-^^^''(x) =^ (x)*-^^^'' — (x)*-"^^ is the difference of the average value of x between 
learning epochs T + 1 and T. 
Thus: 



(u)(^+^) = A {uf^ + (1 - A)^ + A>V(^)5p(^+^)(x) + ^r(^) (x)(^+^) 
and by recurrence: 



T T 

A^(u)(i) + (l-A^)^+A^A^-">V(")5p("+i)(x) + ^$^A^-"rW(x)("+^) (D2) 



AT 

ra=l n=l 
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APPENDIX E: Proof of eq. ([29]) 



Call u**^'^) (u'**^'^^) the fixed point (for the variable u) with (without) ^. We have: 



and: 



u*(^) = WF(u*(^))+^ 

Therefore: 

A series expansion yields, to the linear order: 

(J-WA(u(^)))5u(^) = 
Decomposing on the eigenbasis of >VA(u*^^)) we obtain: 

(l-A,)(W^),Vfc) = -(|,Vfc) (El) 

which corresponds to eq. (l29l) provided \\k\ < 1 (ensuring that the matrix I — >VA(u^^^) is 
invertible). 

APPENDIX F: Jacobian matrix and feedback loops background 

Assume that we slightly perturb at time t the state of neuron j with a small perturbation 
(e.g. Xj{t) Xj{t) + Sj{t)). Then the effect of this change on neuron i, at time t + 
1 is given by Xi{t + 1) = / (^^=1 ^ikXkit) + + ^ij^jit)^- One can perform a Taylor 
expansion of this expression in powers of Wij6j{t). To the linear order the effect is given 
by f'{ui{t))Wij6j{t). To each Jacobian matrix -DFx one can associate a graph, called "the 
graph of linear influences" . such that there is an oriented edge j — > i iff 7^ 0. The 

edge is positive if > and negative if < 0. An important remark is that this 

graph depends on the current state x, contrarily to the weights matrix which is a constant 
inside a given learning epoch. This has important consequences. Indeed, in our case since 
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= f'{ui)Wij, the edge j ^ i in the graph of hnear influences can be very small even if the 
synaptic weight Wy is large. It suffices that \ui\ be large. This effect, due to the saturation 
of the transfer function /, is prominent in the subsequent studies. 

We have now the following situation: "above" (in the tangent bundle) each point x, there 
is graph. This graph contains circuits or feedback loops. If e is an edge, denote by o(e) 
the origin of the edge and t(e) its end. Then a circuit is a sequence of edges ei, such 
that o(ej+i) = t(ej), Wi = l...k — 1, and t{ek) = o(ei). Such a circuit is positive (negative) 
if the product of its edges is positive (negative). A positive circuit basically yields (to the 
linear order) a positive feedback that induces an increase of the activity of the neurons in 
this circuit. Obviously, there is no exponential increase since rapidly nonlinear terms will 
saturate this effect. It is thus expected that positive loops enhance stability. 

A particularly prominent example of this is well known in the framework of continu- 
ous time neural networks models and also in genetic networks. It is provided by so-called 
"cooperative systems". A dynamical system is called cooperative if ^^^(x) > 0,V2 7^ j. 
Therefore, in this case, all edges are positive edges^°, whatever the state of the neural net- 
work and all circuits are positive. Cooperative systems preserve the following partial order 
^ < y Xi < Hi, i = 1 . . . N. Thus x(0) < y(0) =^ x(t) < y(t), Vt > (th is corre s ponds 



to the positive feedback discussed above). From these inequalities, Hirsch (jHirschl . Il989l ) 
proved that for a two dimensional cooperative dynamical system, any bounded trajectory 
converges to a fixed point. In larger dimension, one needs moreover a technical condition 
on the Jacobian matrix: it must be irreducible. Then Hirsch proved that the tu-limit set of 
almost every bounded trajectory is made of fixed points. Note that this result holds when 
/ is nonlinear. 



On the opposite, negative 



Thomas conjecture (IThomas 



oops usually generate os cillations. Fo r example, the second 



19811 ) ■ proved by Gouze (jGouzd 119981 ) under the hypothesis 
that the sign of the Jacobian matrix elements do not depend on the state, states that 
"A negative loop is a necessary condition for a stable periodic behavior". In our model, 
negative loop generate oscillations provided that the nonlinearity g is sufficiently large. This 
can be easily figured out by considering a system with 2 neurons. A necessary condition to 



^'^ More generally, there is a variable change which maps the initial dynamical system to a cooperative system 
with positive edges. 
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have a Hopf bifurcation giving rise to oscillations is W12W21 < 0, but the bifurcation occurs 
only when g is large enough. 
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FIG. 1 The Hebbian learning rule eq. (5) contracts the spectral radius of VV. The evolution during learning of the norm of W 
largest eigenvalue, \s\ \ is plotted on a log-log scale for, from bottom to top, A = 0.80 (squares), 0.90 (circles), 0.95 (triangles) 
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Oh or 1.00 (diamonds). Each value is an average over 50 realizations with different initial conditions (initial weights and neuron 
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states). Standard deviations are smaller than the symbols. Black full lines are plots of exponential decreases with equation 
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FIG. 2 The Hebbian learning rule eq. (5) induces reduction of the dynamics complexity from chaotic to periodic and fixed 
point. {A) Evolution of the largest Lyapunov exponent Li during 100 learning epochs for, from bottom to top, A = 0.80 
(squares), 0.90 (circles), 0.95 (triangles) or 1.00 (diamonds). Each value is an average over 50 realizations with different initial 
conditions (initial weights and neuron states). Bars are standard deviations (and are mostly smaller than symbol size). The 
dashed lines illustrate decays of the form g{T) oc Tlog(A) (see text). (S) Examples of network dynamics when learning is 
stopped at epoch (from bottom to top) T = 1 (initial conditions, chaos), 5 (limit cycle), 6 (simpler limit cycle) or 100 (fixed 
point). These curves show the network-averaged state {x^'^\t)'^ = IZ-'V ^^^^ a;^"^^ (t) and arc shifted on the y-axis for clarity. 
The height of the vertical bar represents an amplitude of 0.1. N = 100 and all other parameters are as in fig. 1. 
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FIG. 3 The local field u = ^ + Wx (thin full line) and the real part of the first eigenvector of the Jacobian matrix (thin 
dotted line) converge to the input pattern ^ (thick dashed line) at intermediate-to-long learning epochs. Snapshot are presented 

at T = 1 {A, initial conditions), T = 10 (B), T = 60 (C) and T = 200 (D) learning epochs. Each curve plots averages over 50 
realizations (standard deviations are omitted for clarity), vectors have been normalized to [0, 1] for clarity. All other parameters 
as in fig. 1 
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FIG. 4 The network sensitivity to the input pattern is maximal close to a bifurcation. The evolution of the average value 
for the spectral radius of DF^^ during learning (dotted line) is plotted together with the sensitivity measure A^^) [A] (full 
line) for A = 0.80 (A) or 0.90 (B). The panels also display the corresponding evolution of the largest Lyapunov exponent L\, 
plotted as 1.0 + Li for obvious comparison purpose (dashed line). The values of A(-^'[A] are normalized to the [0 — 1] range 
for comparison purposes. Each value is an average over 50 realizations (standard deviations are omitted for clarity). All other 
parameters were as in fig. 1 
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